Increasing the attraction area of the global minimum in the binary optimization 

problem 
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The problem of binary minimization of a quadratic functional in the configuration space is dis- 
cussed. In order to increase the efficiency of the random-search algorithm it is proposed to change 
the energy functional by raising to a power the matrix it is based on. We demonstrate that this 
brings about changes of the energy surface: deep minima displace slightly in the space and become 
still deeper and their attraction areas grow significantly. Experiments show that this approach re- 
sults in a considerable displacement of the spectrum of the sought-for minima to the area of greater 
depth, and the probability of finding the global minimum increases abruptly (by a factor of 10'^ in 
the case of the 10 x 10 Edwards- Anderson spin glass). 
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I. INTRODUCTION 

Discrete optimization plays a central role in many en- 
gineering problems as well as in fundamental science. 
One major open problem concerns the nature of the en- 
ergy landscape of optimization problems. It is generally 
agreed that these energy landscapes are rugged. Finding 
the ground states of these disordered systems is generally 
an NP-hard problem so efficient algorithms are particu- 
larly called for 

The goal of this paper is to make a random-search 
procedure used in binary-minimization problems more 
effective. This sort of problem involves minimization of 
a quadratic functional Ei[S) built around a particular 
Nx N matrix T in an A^-dimensional configuration space 
of states 5 = (si, S2 , sn) where the discrete variables 
are Si — ±1 , i = 1, N. 

Minimization in a configuration space is quite different 
from minimization in a continuous-variable space where 
it can be solved with the help of conventional methods. 
Firstly, when we deal with binary variables, the quadratic 
functional has many extremes even if the matrix T is 
positive definite, the number of extremes growing expo- 
nentially with the dimensionality of the space. Secondly, 
the problem of binary minimization is NP-complete in 
the general case, i.e., it does not have a polynomial- 
time solution algorithm. For this reason heuristic meth- 
ods are used to solve it. Similarly to the alternating- 
variable (gradient) descent method, which can be ap- 
plied to functional minimization in a continuous space, 
the asynchronous dynamics of the Hopfield neural net 
0, Q is used in a discrete space. This kind of network 
keeps decreasing the value of the functional until a local 
minimum is reached. When we use a random-search al- 
gorithm, many randomly chosen initial states are tried to 
start the neural-net dynamics until a deep enough min- 
imum is reached. Because of the large number of local 
minima, this method does not work efficiently, though it 
is an acceptable solution. 



Despite these difficulties, heuristic methods have found 
wide use in binary-optimization problems. The success- 
ful solution of the traveling salesman problem Q with the 
Hopfield neural network introduced neural-net methods 
in graph theory Q , image processing Q and other fields 

mi- 

The reasons for such a successful application are 
studied in [T^ where the probability of finding a mini- 
mum in random search is shown to grow exponentially 
with the depth of the minimum. In particular, when a 
neural network is initialized at random, it is very likely 
to converge to a state corresponding to the global mini- 
mum or to a local minimum whose dep th is comparable 
to that of the global minimum [Uliil. The probability 
that the network converges to a relatively shallow local 
minimum is exponentially small. All this means that the 
neural network is most likely to find one of the subopti- 
mal solutions (local minimum) if not the optimal solution 
(global minimum). 

Usually, one tries to make the random-search proce- 
dure more efficient by changing the algorithm of the de- 
scent down the energy landscape described by functional 
Ej{S). A good review of these methods is given in 
|l9| . In contrast to that approach, we alter the energy 
landscape itself in order to increase the radius of the at- 
traction area of the global minimum (and other minima 
that are almost as deep as the global one) . Following the 
theory developed in [20| , we have considered the simplest 
kind of alteration which involves raising the matrix T to 
the power fc (fc = 2, 3, ... ). The approach proved quite 
effective: the change of the surface increases the proba- 
bility of finding the global minimum by 10'^ — 10"* times 
and makes the spectrum of the sought-for minima move 
far into a deeper-depth region. 

We substantiate the efficiency of this algorithm only 
for the case of matrices whose elements can be con- 
sidered as random independent variables (matrices of 
the Edwards- Anderson model, matrices of uniformly or 
normally distributed elements (Sherrington-Kirkpatrick 
model), etc.). Application of the algorithm for other 
types of matrix will be heuristic. 



2 



In the following section, we give the general framework 
in which we work. Section III postulates some necessary 
theorems concerning the depth and the shape of the lo- 
cal minima. Sections IV and V introduce the double 
descent algorithm with a matrix to the power k (DDK 
algorithm) and describe the main features and grounds of 
the energy landscape transformation. Finally, in Section 
VI we explain how the algorithm behaves in practice on 
test problems. 



II. STANDARD RANDOM-SEARCH 
ALGORITHM 

The problem of binary minimization can be formu- 
lated as follows: there is an iV x TV matrix T^, the 
goal is to find an A^-dimensional configuration vector 
= {s["'\^ 4") , 4™)), ^±l,^^T^N that 
ensures a minimum for the energy functional Ei{S): 

N N ^ 
£^l(<S) = -Ci^^TyS.Sj, Ci = ^^2— (1) 

i=l j=l 

where ci is just a normalization coefficient introduced 
to allow us to correctly compare the results for different 
matrices (see Appendix) ; ctt is the standard deviation of 
the elements of T. Without loss of generality, we assume 
that Tij is a symmetric zero-diagonal matrix (Ty = Tji , 
Tii — 0, Vi, J — 1, N) with zero mean value (Tij = 0). 

For minimization we use the Hopfield model 2] which 
is the basis of most of today's algorithms of binary op- 
timization. The model is a system of A'' spins whose in- 
teractions are governed by the energy functional Ei{S). 
Only conventional (asynchronous) behavior of the Hop- 
field model is considered: we compute the local field that 
acts on an arbitrary spin (i-th spin): 

N 

h^^^T.jSj. (2) 

If the spin is in an unstable state {hiSi < 0), the spin 
changes its state in accordance with the decision rule 
Si = sgnhi. The procedure is applied successively to 
all spins until the network comes to a stable state Sm- 
We should point out that in the asynchronous behavior 
the state of only one spin changes at each moment of 
time. This is in contrast to synchronous behavior where 
all spins change their states at the same time. The dy- 
namics under consideration (hi ~ ~dEi{S) / dsi) is no 
other than descending the energy surface Ei [S) which 
resembles the alternating-variable (gradient) descent in 
a continuous space. 

A decrease of energy accompanying each overturn of 
an unstable spin guarantees that the process described 
by asynchronous dynamics will bring the system to a sta- 
ble energy-minimum state after a finite number of steps. 
Of course, this minimum is very likely to be a local one 



Algorithm SRS {Standard Random Search) 
Begin 

Step 1. Random Initialization 

Initialize configuration 5 = at random 

Step 2. Descent over landscape £, from S to minimum 

Calculate 'l, =X^;-'j ^■'■^ i = l,N 

while there are unstable spins ( /z^s. < ) 
for each spin in S 
if h^s^ < then 

Refresh hj = hj + 2T.jS, for all 7 ?t / 
end if 
end for 
end while 

Repeat from Step 1 until the minimum of a required depth 
is reached or until the uptime ends. 
End Algorithm 

FIG. 1. Conventional random-search algorithm using the 
asynchronous neural-net dynamics 

despite our wish to find the deepest (global) minimum of 
the functional. That is why we have to turn to random 
search which involves descents from different random ini- 
tial configurations repeated until the minimum of a spe- 
cific depth (probably a global minimum) is reached. 

An example of a conventional random-search algo- 
rithm is presented in Fig. [T] Below we will denote it 
as SRS (Standard Random Search) and use it for com- 
parison with the algorithm we offer. Though the SRS 
algorithm is simple, the descent from a single random 
configuration requires about Osrs ~ 2A^^ operations 
for Sherrington-Kirkpatrick model and Osrs ~ 8iV for 
Edwards-Anderson model with AN nonzero matrix ele- 
ments Tij. Moreover, the number of local minima grows 
exponentially with N and therefore the probability to 
find the global minimum falls exponentially. The effi- 
ciency of the random-search procedure is usually charac- 
terized by three parameters: the probability of finding 
the global minimum, the search time for a given range of 
energies, and the average depth of sought-for minima. 



III. BASIC RELATIONS 

Before coming to the question of surface transforma- 
tion, let us define the basic relations involving the depth 
of a global (local) minimum. The first relation deals 
with the limitation on the depth of a minimum. Let 
Sq = {s^i\ S2°^ , sj^-*) be a configuration correspond- 
ing to global minimum Eq = Ei{So). Let us single out 
a term To from T. Tq is responsible for the minimum's 
appearing. For this purpose let us rewrite T as 

T = To+Tu (3) 

where 

To = roarSf^So, Ti = T - Tq. (4) 
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The weight tq can be found from the condition that there 
is no correlation between the elements of Tq and Ti . By 
calculating the covariance of these matrix elements and 
setting it to zero, we find that 



ro 



-En. 



(5) 



This expression establishes the relation between ro and 
the depth of the minimum Eq. The variances of the 
elements of Tq and Ti have the form CTq — r^a^ and 
a\ = it|i — CTq, which verifies that we managed to repre- 
sent a random matrix T as the sum of two independent 
matrices Tq and Ti. In addition, from - © it follows 
that SqTi Sq = 0. This relation evidences that the con- 
tribution which Ti makes to Eq is strictly zero, that is, 
the minimum at 5*0 is solely the contribution of matrix 
To. 

Let us follow [2l| and continue expanding ([3]). Let the 
configuration Si be an extreme of a quadratic functional 
built around Ti. Similarly to ([3]), let us present Ti as 
Ti = riaxS^ Si + T2 and define the statistical weight ri 
using zero covariance of the elements of S^ Si and T2. 
Repeating this procedure, we can get the matrix in the 
form of a series of weighted exterior products of random 
configuration vectors: 



T — ffT 



?n— 



= I. 



(6) 



All the conclusions drawn in with the aid of statisti- 
cal physics hold true for this sort of matrix. In particular, 
we can state that any of vectors S,n entering the expan- 
sion of T will be a minimum of the functional ^ only 
when its weight is greater than the threshold value 



1 



(7) 



where ac « 0.138 is the threshold value of the load pa- 
rameter [23|. First of all, the statement refers to the 
point So which is the minimum of the functional (1) by 
definition and for which the following relations are true 

1 > r-o > r„ -1 < -Bo < E,, E, = -r,. (8) 

The bounding of ro above is evident: when ro — > 1, we 
have T — Tq, i.e., T is formed as an exterior product of Sq 
and El [S) has a single minimum at point Sq. This limit is 
of no interest because the finding of its global minimum 
presents no difficulty. In most cases the minimization 
problem faces the situation when ro ~ Tc and Eq ~ Ec 
and the probability to find the global minimum is very 
low. It is this situation (ro ^ 1) that we will examine. 

The second necessary expression relates the depth and 
width of a minimum. As shown in IJ], the width of a 
minimum Eq increases with its depth. Correspondingly, 
the probability to find this minimum grows exponentially 
as P{Eq) ~ exp (^— N E^ / Eq) . Such a rigid connection 
between the depth of the minimum and the probability 



of finding it can be easily understood from the following 
consideration. The probability of finding a global mini- 
mum in the course of a random search is no other than 
the ratio between the number of points located within 
its attraction area and the total number of points (2^) 
in the A^-dimensional space. It is possible to show that 
the shape of the energy surface around the minimum is 
described [l6| by the expressions 



2n 

EQ{n) ^Eq[1- — 



(9) 



where i?o(n) and cro(") £^re the mean value and standard 
deviation of energy in the n- vicinity of the minimum (at 

(n) 

points Sq located distance n away from 5*0 according 
to Hamming). Expressions (|9]) are obtained in the limit 
N 00 from the exact expressions given in Appendix. 
The energy surface ^ is very smooth because the dis- 
persion of energy in the n- vicinity approaches zero with 
increasing dimensionality (cr ^ ^/N). Moreover, differ- 
ing in depth and width, all minima have the same shape 
determined by expression . As seen from ^ the width 
of a minimum is proportional to its depth Eq. This 
means that the number of points within the attraction 
area grows exponentially with the depth of the minimum. 

All of the above conclusions have to do with the config- 
uration 5*0 corresponding to the global minimum. How- 
ever, they are true for any extremum Sm of Ei{S): 
^ ^ fm 1^ fc and Ec > Em > —1 hold for minima, 
and Vc < — r^ < 1 and < E^ < 1 hold for maxima. 

The result is that we have determined two formulas: a) 
the deeper the minimum Eq, the greater the weight ro of 
an addition to configuration Sq in the initial matrix T and 
the higher the probability of finding the minimum, b) the 
point 5*0 can be a minimum only if ro > rc, that is, if the 
depth of the minimum \Eq\ is greater than the threshold 
value \Ec\. These formulas set the direction of our efforts 
to improve the efficiency of the random-search algorithm: 
the energy surface should be transformed in so way that 
the global minimum will become deeper and, therefore, 
the probability to find it will grow. This purpose can be 
reached by raising T to the power k as it is shown in the 
next section. 



IV. SURFACE TRANSFORMATION 

The surface defined by Ei{S) can be transformed only 
by changing the corresponding matrix. Let us put M — 
(1 - z)T + zT'' into (J]). Here, T'' is the matrix resulting 
from raising T to the power k (k — 2,3,...) and zeroing 
its diagonal elements. Let us pass from T to M = by 
varying a parameter z from to 1. Correspondingly, the 
surface determined by Ei (S) transforms into the surface 



4 



described by Ek{S): 

N N ^ 

EkiS) = -Cfc M,jSiSj,Ck = Jp^^ (10) 

i=l j=l 

where ctm is the standard deviation of the elements of 
M. Transformation of the global minimum is taken as the 
basis for all our considerations. It is clear that the surface 
transformation will shift the global minimum in the space 
and will change its depth and attraction area size. We 
will show below that when the exponent k is relatively 
small (2 < fc < 5) this transformation will change the 
minimum depth noticeably. The shift of the minimum is 
relatively small in case 2 < fc < 5. 

Correspondingly we offer a two-stage algorithm of min- 
imization. The first stage consists of a descent over the 
surface Ek(S) and finding a configuration which 
brings Ek{S) to a minimum. The second stage involves a 
correction: we descend over the surface Ei{S) from point 

Sm"^ to the nearest minimum Sm of Ei{S). The method 
of descending the surface Ek{S) is exactly the same as 
that described above: we compute the local field at the 
i-th spin: 

N 

h^^YM,,s,- (11) 

and when hi si < 0, the spin's state changes in accor- 
dance with the decision rule Si — sgn/i^-^\ 

Using this two-stage descent, we reahzed the random- 
search method whose formal algorithm is given in Fig. [2l 
Below we will refer to it as DDK (Double Descent algo- 
rithm with parameter k — 2,3, ... ) and compare its ef- 
ficiency with the conventional random-search algorithm 
SRS. To avoid misunderstanding, note that when k — 1^ 
the transformed functional Ek{S) is identical to the origi- 
nal one Ei{S)^ and the DDK method does not differ from 
the common SRS {DD 1 = SRS). 

The number of operations for the DDK algorithm is 
about twice that of SRS, i.e., Oddk ~ '^Osrsi but we 
also should remember that we have to spend some time 
(about N'^ operation) on matrix exponentiation. 

V. THE DDK ALGORITHM VALIDATION 

To justify the DDK algorithm, we turn to the case 
when k — 2. We will show that the surface transfor- 
mation makes the global minimum significantly deeper, 
while its position in binary space changes only slightly. 

A. The deepening of the minima 

Let us first make sure that the transformation of the 
surface results in the minima becoming deeper. Let us 



Algorithm DDK {Double Descent with parameter k=2,3,...) 
begin 

Step 1. Random Initialization 

Initialize configuration iS = (jj.Sj,..., 5^,) at random 
Step 2 . Descent over landscape E^^ from S to minimum S^^ 
Calculate Z^**"' = X'^y'^j ^'^■^ i = \,N 

while there are unstable spins [ /I'^'i', < ) 
for each spin in S 

if /7**'5, < then 
= — s, 

Refresh h^p = hf + 2M ^^s , for all j^i 

end for 
end while 

Step 3 . Correction : descent over landscape from Sj^^ to minimum 5^ 

Calculate h.=Y'^v^j ^■^'^ i = \,N at configuration S = Sj^^ 

while there are unstable spins ( h^s, < ) 
for each spin in S 
if h^s^ < then 

= —5, 

Refresh h^=h^ + 2T,^s, for all j^i 
end if 
end for 
end while 
end Algorithm 



FIG. 2. DDK algorithm of random search using a two-stage 
descent 

consider the energy i52o = E2{Sq) at the point Sq. Let 
us make use of ^ and represent the matrix Af = 
as M = + + (ToTi + TiTq). Then in view of the 
relations SqTi Sq = and ctm = \lNa'^ we get from 

N N 

E,o = -VneI + ^ Y Y (T?hsrhf\ (12) 

It follows from that when ^ 1, E2Q can be re- 
garded as a normally distributed quantity with mean E20 
and relatively small standard deviation as ■ 

E20 = -VneI (Tb - (1 - rl)/N. (13) 

Since E20/EQ = ToVN > r^^fN « 1.35, the minimum 
should be expected to become deeper after the surface 
transformation {E20 < Eq). The probability of that 
event is given by the expression: 

Pr{^20<^o} = ^(l + er/7), (14) 

where 

^ Eq - E2Q ^ roN{roVN-l) 
^ V2aE %/2(l-r2) 

From ro > it follows that 7 > 0.3%/F. This 
means that the probability of the minimum's deepen- 
ing Pr {iJ2o < Eq\ approaches unity with increasing N. 
In other words, the surface transformation is most likely 
to cause the minimum to deepen considerably (by a fac- 
tor of 1.35 on average), which, according to leads to 
the probability of finding the global minimum increasing 
exponentially with N . 
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It should be noted that £20 = —VNEq. This ahows 
the conclusion that E2{S) reaches its minimuni when we 
deal with both the configuration 5*0 which corresponds 
to the minimuni (Eq < 0) of Ei(S), and the configura- 
tion 5*0 which corresponds to the maximum {Eq > 0). 
Experiments confirm this fact. 



B. The shift of the minimum 

Now let us evaluate how far the minimum moves after 
the surface transformation. The average distance of the 
shift can be represented as dk — NP, where 



P = FT{sf>h. 



(0), W 



< I his,^ 



(0) 



>0}, 



(16) 



is the conditional probability of the spin sf''^ and the local 

(k) 

field h] having different directions given a configuration 
of minimum Sq. For our case of fc = 2 we can get from 
PT|) the following expression: 



wher 



£,2i 



(0)^0) 
3 ' 



JO) (0) 



(17) 
(18) 

(19) 
(20) 



are normally distributed quantities with zero means and 
standard deviations cr^. — ctti \/ N/2 and a^^i = '''li ■ 
This allows us to use the error function to express the 
probability p6|) . It can be shown that 



P 



1 



eTk{V2jx)dx, (21) 



where Pq = 1 + erf (7) and 7 = Vn tq . 

It is seen that the shift of the minimum d2 = NP is 
not large: given VNvq > 1.35, we get that c?2 < 0.026A^. 
It can be shown that in case fc = 3, the shift is even 
smaller. 



C. Conclusions and their verification 

The following conclusions can be drawn from expres- 
sions ([T2l) -(|2T |) . The surface transformation is most likely 
to make minima deeper and, therefore, the probability of 
finding them higher. Moreover, the greater the origi- 
nal depth |£'o|i the greater the average increase of the 
depth E20/EQ — ^/N \Eo\. In other words, deep min- 
ima become still deeper and the probability to find them 
becomes yet higher, while shallow minima become more 
shallow (if not even disappearing) and the probability to 




FIG. 3. The post-transformation deepening and standard 
deviation of the energy Ek{So) when the matrix is raised 
to a power k — 2, 3, 7. Edwards-Anderson matrices, 
iV = 100, 196, 484 



discover them become still lower. The above allows us to 
expect the spectrum of minima found with the given al- 
gorithm to move noticeably towards the global minimum, 
and the probability of finding it to increase considerably. 
These conclusions have been tested in a few numerical 
experiments. Fig. [3] shows the observed deepening of the 
global minima. As we see, global minima become deeper 
for fc = 2, 3, 4 and even for fc = 5, but with increasing 
dispersion. The most average deepening is observed for 
fc = 3. 

Since the deepening of minima should lead to an in- 
crease in the probability of finding them, the spectrum 
of the minima of the transformed functional should shift 
towards greater depths. To confirm this, Fig.|4]shows the 
spectral densities of the minima of two functional Ei{S) 
and £^2(5') built for N = 100. The density N{E) is de- 
fined in the following way: N{E)dE is the number of 
local minima found in the energy interval [E, E A- dE] 
and normalized to the total number of minima. We see 
that the spectrum of E2{S) is some distance to the left 
from that of the original functional Ei{S), and the min- 
imum at point has become noticeably deeper. The 
number of found minima of E2{S) is roughly half that of 
Ei{S). This shows that we have reached our ends: the 
transformation of the surface arranged that deep min- 
ima become yet deeper and are detected more frequently, 
while shallow ones are found less frequently or not at all. 

The shift of the minimum caused by the surface trans- 
formation is relatively small. From (I^TI) it follows that 
the smallest shifts are expected for the deepest min- 
ima. For example, in case N = 100, fc = 2 the experi- 
ment shows that for different matrices the shift of global 
minima varies from to 6 bits and its average value is 
d2 ~ 3.5 bits, which corresponds well to (|2Tt . Figure [5] 
shows the relation between the shift and transformation 
parameter fc. It is seen that the smallest shift occurs 
when fc = 3. 
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FIG. 4. Spectral density of minima of functionals Ei{S) and 
E2{S). Squares denote energies Ei{So) and E2{So) for con- 
figuration So which corresponding to the global minimum of 
functional Ei{S) 
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FIG. 5. Shift of the global minimum caused by the surface 
transformation. The curves are experimental data obtained 
with 2D Edwards-Anderson matrices at = 100, 196, 484 



We see that all the theoretical conclusions agree with 
the experimental results, providing good grounds for the 
use of the DDK algorithm. In the following sections we 
will give information on how to apply the algorithm for 
different types of matrices. 



VI. THE EFFICIENCY OF THE DDK 
ALGORITHM 

A. Results for Edwards- Anderson matrices 

The algorithm was tested using mostly matrices of 
the 2-dimensional Edwards-Anderson model with TV = 
10 X 10. This type of matrix was chosen for two reasons. 



Firstly, the standard random search procedure for this 
sort of matrix is not efficient: when N — 100, the proba- 
bility of finding the global minimum is less than 2-10^^. 
Secondly, in the case of Edwards- Anderson matrices it is 
possible to use the branch and bound method [lT| to find 
the global minimum 5*0, which allows us to control the 
efficiency of the process. 

We will determine the efficiency of the DDK algorithm 
with the help of three parameters: the probability of 
finding the global minimum, the difference between the 
average energy of the local minima Em and the energy 
Eq of the global minimum expressed by a dimensionless 
quantity of discrepancy SE = {Em — Eq^ /Eq, and the 
probability of entering the energy interval [Eq, 0.99i?o]- 

The DDK algorithm efficiency was evaluated for k from 
2 to 7. First of all we watched how the spectrum of lo- 
cal minima of Ei{S) determined by the DDK algorithm 
changes when the transformation parameter k grows. 
The processing of the data allowed us to obtain the prob- 
ability density P{E) where P{E)dE is the probability of 
finding minima in the energy interval [E^ E + dE]. Fig- 
ure [6] shows the probability density curves determined for 
a 2D Edwards- Anderson model with N = 100, the energy 
of the global minimum being taken equal to -1. It is seen 
that the spectrum of minima moves towards the location 
of the global minimum. Besides, even the probability of 
finding the global minimum goes noticeably further from 
zero. 

In addition, we have investigated how the type of the 
search algorithm affects the distribution of local minima 
around the global minimum. The distribution is governed 
by the probability Wk = Wk (d) which is computed by the 
DDK algorithm and determines the probability of finding 
a local minimum at a point distance d away from the 
global minimum. The magnitudes of Wk — Wk (d) do not 
carry useful information. For this reason Fig. [7] gives Wk 
normalized to probability Wsrs = WsRs{d) determined 
by using the SRS algorithm. It is seen that the use of 
the DDK algorithm increases the probability of finding 
a local minimum near the global one considerably. In 
particular, the probability of finding the global minimum 
(d — 0) grows more than three orders of magnitude when 
k grows. 

The experimental results are given in Table ID The 
first column holds data for the SRS algorithm. The other 
columns hold data for the DDK algorithm (for k = 2 — 5). 
To evaluate the efficiency of the algorithm, we picked 
three characteristics: the first row contains the deviation 
6E = [Em — Eq) /Eq] the second the probability of en- 
tering the energy interval [Eq, 0.99i?o] in close vicinity to 
the global minimum; the third the probability of hitting 
the global minimum. It is seen that the results for k = 5 
little differ from the results for /c = 3. When fc > 5, the 
results become worse. That is why a further increase of 
k does not make much sense: as k grows, the efficiency of 
the algorithm falls and computations begin to consume 
too much time. 
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TABLE I. Comparison of the DDK algorithm efficiency with the results of using the SRS algorithm performed for a 2D 
Edwards- Anderson model = 10 x 10 

SRS DD2 DD3 DD4 DD5 

Deviation 5E={E,^-Eo)/Eo 16% 11% 6% 10% 5% 

Probability of entering the energy 2.8 ■ 10~^ 2.3 ■ 10"^ 1.1 • 10"^ 5.6 ■ 10"^ 2.5 • 10~^ 
interval [Eq, 0.99Eo] 

Probability of hitting the global 2.4 • lO"'' 3.8-10"'' 2.1-10-^ 1.2-10-^ 4.1-10-^ 
minimum Pk{d = 0) 
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FIG. 6. PiE) is the density of the probability of finding 
minima of Ei (S) computed with the aid of the SRS algorithm 
and DDK algorithm for fc = 2, 3, 4, 5 



relatively high chances (about 1% for matrix dimension- 
ality N = 100) to find the global minimum with a conven- 
tional neural-net dynamics. However, it was interesting 
to check how efficiently the method under consideration 
works with this kind of matrices. 

Table im shows the results of the surface transformation 
obtained for Sherrington-Kirkpatrick matrices. We dealt 
with matrices of dimensionality = 100. It is seen that 
the raising-to-a-power trick also increases the minimiza- 
tion efficiency for this sort of matrix. Indeed, if we use 
the SRS algorithm, the probability of finding the global 
minimum is just below one percent, while the raising of 
the matrix to the third power (DD3 algorithm) allows a 
tenfold increase of this probability. 



Improvement of the dynamics. 
Houdayer-Martin algorithm 




0.00 0.05 0.10 d/N 



FIG. 7. Wk = Wk{d) is the probability to find minima dis- 
tance d from the global minimum. Wk is computed with the 
help of the DDK algorithm for fc = 2, 3, 4, 5 and divided by 
WsRS = WsRs{d) determined using the SRS algorithm 



B. Results for matrices of Sherrington-Kirkpatrick 
model 

Unlike sparse Edwards- Anderson matrices, full matri- 
ces with evenly (or gaussian) distributed elements allow 



It follows from the above that the increased efficiency 
of the algorithm is caused by a static change - the al- 
teration of the surface being descended. However, it is 
possible to modify the dynamics of descent. We found 
in experiments that the descent often stops in shallow 
pits just a few bits off the global minimum. In order to 
avoid this unfavorable effect, we changed the algorithm 
of descent. 

We borrowed a local optimization algorithm from [ill . 
This algorithm was inspired from the Kernighan-Lin [23| 
algorithm and in [l| it was adapted to our problem of 
quadratic functional minimization in binary space. This 
local search algorithm is similar to Hopfield dynamics. 
However it is more efficient due to its capability of going 
out of small cavities in the energy landscape. 

Here we use this algorithm in the form proposed in 
[l| . The number of operations that occur during the ex- 
ecution of one start of the Houdayer-Martin dynamics is 
about Ohm ~ IOOsrs- 

The use of this new Houdayer-Martin dynamics of de- 
scent instead of Hopfield neural net dynamics improved 
the efficiency of the DDK algorithm by an order of mag- 
nitude. The improved characteristics are shown in Ta- 
ble mil (the SRS and DDK algorithms using a modified 
Houdayer-Martin dynamics are designated as SRS-HM 
and DDK-HM here). 

Figure [5] gives the probability density curves P{E) 
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TABLE II. The results of using the algorithms SRS and DDK in the case of Sherrington-Kirkpatrick matrices, A'^ — 100 

SRS DD2 DD3 DD4 DD5 

Deviation (5£ = (S™ - £o)/£o 10% 5.4% 2.7% 4.8% 1.9% 

Probability of entering the energy 2.7 • 10"^ 9.3 ■ 10"^ 0.24 0.15 0.22 
interval [Eo, 0.99£ol 

Probability of hitting the global 0.01 0.04 0.12 0.07 0.08 
minimum Pk {d — 0) 



TABLE III. Efficiency of modified SRS and DDK algorithms with Houdayer-Martin dynamics of descent. Data correspond to 
a 2D Edwards-Anderson model with N = 100, 196 





SRS-HM 


DD2-HM 


DD3-HM 


DD4-HM 


DD5-HM 


TV = 100 












Deviation SE = {Em — Eo)/Eo 


4.7% 


4.0% 


2.6% 


4.0% 


2.5% 


Probability of entering the energy 


5.0% 


8.2% 


16.2% 


9.0% 


17.4% 


interval [Eo, 0.99£o] 












Probability of hitting the global 


1.4% 


2.8% 


6.0% 


3.9% 


5.9% 


minimum Pk {d — 0) 












TV = 196 

Deviation SE = {Em — Eo)/Eo 

Probability of entering the energy 
interval [Eo, 0.99SO] 
Probability of hitting the global 
minimum Pk {d — 0) 


5.9% 
0.2% 

4.7 ■ 10"^ 


4.8% 
0.5% 

2.2 ■ 10"* 


3.4% 
2.1% 

4.3 ■ 10"* 


4.7% 
0.6% 

3.6 ■ 10"* 


3.2% 
2.0% 

6.7-10"* 



when we use the modified search algorithms DDK-HM 
and SRS-HM. Comparing them with the curves in Fig. [51 
we can see that this modification of the descent dynam- 
ics brings about a significant shift of the spectral curves 
towards the global minimum. It is seen that due to this 
modification the probability of arriving at a global min- 
imum increased by another order of magnitude. The 
DD3-HM algorithm gives the best result: the probability 
to find the global minimum is 2.5 • 10"* times higher than 
for the SRS method. 

For Sherrington-Kirkpatrick matrices the gain of us- 
ing the DDK-HM algorithm instead of the SRS-HM al- 
gorithm becomes evident with increasing dimensionality 
TV. 



VII. DISCUSSION 

Let us again formulate the DDK algorithm. The 
preparatory stage involves three steps: 1) the original 
matrix T is made symmetric (if it is not so initially) and 
the diagonal elements are set to zero; 2) the matrix is 
raised to the kth power and the diagonal elements of the 
resulting matrix M = are set to zero; 3) Ek{S) is built 
using M and formula PU)) . The preparatory stage is fol- 
lowed by the random search procedure which employs 



P{E) 



0.10 - 



0.06 - 



0.00 




-1.05 -1 -0.95 -0.9 -0.85 -0.8 -0.75 E/\E„\ 

FIG. 8. Probability density P{E) found with the help of 
the modified search algorithms SRS-HM and DDK-HM for 
k — 2, 3, 4, 5. The dashed line corresponds to the probability 
density computed by the SRS method 



the two-stage descent algorithm: a) in the first stage 
we descend the surface Ek{S) from the randomly chosen 
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original configuration to the nearest local minimum Sm 
of Ei.{S); b) in the second stage we make a correction 
by going down the surface Ei(S) from Sm^ to the near- 
est local minimum Sm of Ei{S) which is usually located 
near point 5m • In both stages we can use any dynamics 
of descent (i.e., any local optimization procedure), e.g., 
Hopfield's or Houdayer-Martin's, discussed earlier. 

Comparison shows that the surface transformation 
increases the optimization algorithm efficiency signifi- 
cantly. In particular, for Edwards- Anderson matrices of 
size N = 196, the use of the DD3-HM method allows 
the following improvements: the probability of hitting 
the global minimum increases by order of magnitude in 
comparison with the SRS-HM algorithm; the difference 
between the average energy Em and the energy of the 
global minimum decreases by half; and the probability of 
getting the energy interval around the global minimum 
Ejn & [Eq, 0.99£'o] grows by order of magnitude. 

The analysis shows that the power k have an optimal 
value fc = 3, because the shift of global minimum is mini- 
mal (see Fig. [5]) and the deepening is maximal (Fig.[3l) at 
fc = 3. Wnen fc > 3 the dispersion of deepening increases 
and algorithm becomes unstable: when experimenting 
with DD5, we could not detect the global minimum for 
60% of matrices. That is why one should keep in mind 
that though Table |T] tells us about the higher efficiency 
of the DD5 algorithm than DD3, these are averaged data 
and the actual situation may be quite different for a par- 
ticular kind of matrix. 

Adding the Houdayer-Martin algorithm instead of the 
neural-net dynamics also improves the algorithm consid- 
erably. In particular, the DD3-HM algorithm allows hit- 
ting the global minimum with an average probability of 
about 6%, which is 30 times higher than the probabil- 
ity (~ 0.2%) provided by the DD3 algorithm and 25,000 
times greater than the probability (~ 2.4 • 10~^) permit- 
ted by the SRS algorithm. Introduction of a Houdayer- 
Martin approach increases the time of a single descent 
tenfold (see Table ITVT) . However, this dynamics adds sta- 
bility to the algorithm: there were not such matrices for 
which the DD3-HM algorithm failed to find the global 
minimum. 

The above allows the following conclusion: in all exper- 
iments the double-descent algorithm (DDK or DDK-HM) 
brings the system to a configuration that is very close to 
the global minimum in energy and distance with a very 
high probability, all characteristics of the algorithm being 
much better than those of the SRS algorithm. Besides, 
the superiority of the DDK (DDK-HM) algorithm over 
the SRS algorithm only grows with increasing dimension- 
ality N. The examination of the experimental data shows 
that the DD3-HM algorithm is the best for practical use: 
it exhibits the highest probability of finding the global 
minimum, the greatest stability, and the smallest search 
time. 



TABLE IV. Comparing the DDK methods in time consump- 
tion. TV = 100 



Method The average time per 1000 starts (s) 

SRS 0090 

DD2 0.175 

DD3 0.167 

DD5 0.170 

SRS-HM 0.950 

DD2-HM 1.571 

DD3-HM 1.211 

DD5-HM 1.152 
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Appendix: The shape of the energy landscape 

Consider an arbitrary point 5*0 = •S2°\ Stv"*) 

in A^-dimensional configurational space. The set of all 

points n — l^o"^! such that Sq"^ is located n bits away 

from 5*0 is called the n- vicinity of So. Let us show that 
as — > oo the dispersion of energy in the n- vicinity is 
negligible and the energy surface around 5*0 is determined 
by the mean value of the energy in the n-vicinity. 

The components of a vector Sq"-* that belongs to the 
n- vicinity and the components of 5*0 are connected by the 
relation: 

where a„ is a vector such that some N — n of its com- 
ponents equal -1-1 and the other n components equal — 1. 
Using this notation, the energy at the point Sq"^ is: 

Eoin) = -c5:5:T,-.r).f af^aj-^x.-, (A.l) 

i 3 

where Xij is the unit antisymmetric tensor. Further, the 
sum of the energies and the sum of squared energies over 
the set of all points of the n- vicinity is described by the 
expressions: 

N N 

Y^E,{n)^-cY^K.,T,,^^f^ (A.2) 

N 

Y^Elin) = -cY: A-„T.,T,,.f)4°).f .(°)(A.3) 
where 

Kij = '^o.iO.jXijyKijki = '^aia.jakaiXikXji- (A. 4) 
SI a 
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Summing (jA.4p over the sphere J7 = {Sn} and sub- 
stituting the result into (|A.2p and (|A.3p . we obtain the 
mean value and the dispersion of energy in the n- vicinity: 



Eoin) = Eo 



N{N- 1) ' 



(A.5) 



cr^(n) 



8n(iV - n) 



NiN - l)(7V-2)(iV-3) 



[A(l 



(A.6) 

where Eq — ^isf ^ is the energy of the system at 
S'o, hi is the local field acting on the z-th spin of the 
configuration 5'o, (Ths is the standard deviation of hisf\ 
and we use the following notation: 



A = 



4(n- 1){N -n-l) 
N{N - 1) ' 



B = 2N [{N - 2nf - (TV - 2)] 



(A.7) 



(A., 



According to (jA.5|) the mean value of the energy in the 
n-vicinity behaves parabolically. At n = the mean en- 
ergy equals Eq; with increasing n the mean energy Eoin) 
rises as if the configuration Sq were surrounded by a fun- 
nel with parabolic walls. At n = ^{N — Vn) the quantity 
Eo{n) vanishes and at n = N/2 the parabola reaches its 
peak. The further trend of the curve is completely sym- 
metric: with increasing n the mean energy goes down 



parabolically up to energy Eq a,t n = N when the config- 
uration turns into mirroring Sq — —Sq. 

Expressions (|A.5P - (jA.6|l are true for any point of the 
configuration space. However, Eo{n) describes more pre- 
cisely the shape of the energy surface around the min- 
imum: it follows from (|A.6p that the energy dispersion 
is minimal when So is the global minimum because ahs 
is minimal and Eq is maximal. The energy surface de- 
scribed by Eq^u) is very smooth because its roughness 
vanishes with increasing N {ao(n) ~ N~^). 

It is not hard to calculate ahs for any point of the space. 
But when calculating ahs for a minimum Sq we should 
take into account that hisf ^ > 0, Vi = 1, A''. Under this 

condition, we have 2N^al^ - 1 (the dispersion of hiSl 
at a minimum is half that at an arbitrary point). Using 
this relation and keeping only the first terms in (jA.SP - 
(|A.6p that are nondecreasing in the small parameter ^ 
0{N^^), we get the approximate expression 

Summing (IA.2l) - (jA.3p over all n-vicinities {n = 0,iV), 
i.e., over the whole space, we have a trivial result: the av- 
erage value of energy is zero {EQ{n) = 0) and the average 
square of energy is determined only by the average square 
of the matrix elements {Ef^{n) = 2c'^ N (N - l)jfj) , which 
coincides with the dispersion of matrix elements under 
the condition of zero mean value Tij. This explains the 
meaning of the normalization constant c in the energy 
functional: due to this normalization the energy value is 
independent of the sort of the matrix; so we can compare 
results for different matrices of the same type and also 
different sorts of matrices. 
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